The Doubly Correlated Nonparametric Topic Model
Authors
Abstract
Topic models are learned via a statistical model of variation within document collections, but are designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated with various metadata.
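The three ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the logistic link, the truncation to K topics, and every name in it (`logistic_stick_breaking`, `eta`, `phi`, `A`) are assumptions chosen for illustration. Per-document scores are drawn from a Gaussian whose mean is a linear regression on metadata features and whose covariance is A Aᵀ for a square-root factor A, and the scores are then turned into topic weights by stick-breaking.

```python
import numpy as np


def logistic_stick_breaking(scores):
    """Turn K real-valued scores into K topic weights plus leftover mass.

    Assumption: a logistic link maps each score to the fraction of the
    remaining stick taken by that topic. The abstract only says
    "stick-breaking construction"; the link here is illustrative.
    """
    scores = np.asarray(scores, dtype=float)
    fractions = 1.0 / (1.0 + np.exp(-scores))
    # Stick length remaining before each break, plus what is left at the end.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - fractions)))
    weights = fractions * remaining[:-1]
    leftover = remaining[-1]  # mass reserved for topics beyond the truncation
    return weights, leftover


rng = np.random.default_rng(0)
K, F = 5, 3  # hypothetical truncation level and number of metadata features

eta = rng.normal(size=(K, F))  # hypothetical regression weights (topics x features)
phi = rng.normal(size=F)       # hypothetical metadata features of one document
A = rng.normal(size=(K, K))    # hypothetical square-root factor: covariance = A @ A.T

# Correlated Gaussian scores: mean from the metadata regression,
# correlations induced by the square-root factor.
scores = eta @ phi + A @ rng.normal(size=K)

weights, leftover = logistic_stick_breaking(scores)
print(weights, leftover)
```

The weights and the leftover mass always sum to one, so a finite truncation still yields a valid distribution, with the leftover standing in for the unbounded tail of unused topics.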
Similar resources
A New Nonparametric Regression for Longitudinal Data
In many areas of medical research, an analysis of the relation between one response variable and some explanatory variables is desirable. Regression is the most common tool in this situation. It can be used when assumptions such as normality of the response variable hold. In this paper we propose a nonparametric regression that does not require a normality assumption for the response variable and we focus ...
The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling
The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model—each data point is modeled with a collection of components of different proportions. Though powerful, the HDP makes an assumption that the probability of a component being exhibited by a data point is positively correlated with its proportion within that data point. This might be an undesirable assumptio...
The finite sample performance of semi- and non-parametric estimators for treatment effects and policy evaluation
This paper investigates the finite sample performance of a comprehensive set of semi- and nonparametric estimators for treatment and policy evaluation. In contrast to previous simulation studies which mostly considered semiparametric approaches relying on parametric propensity score estim...
Doubly-nonparametric generalized linear models
We extend nonparametric generalized linear models to allow both the mean curve and the response distribution to be nonparametric. The seemingly intractable task of working with two infinite-dimensional parameters is shown to be reducible to a finite optimization problem, which is easily implemented via existing algorithms. We demonstrate using various examples that the proposed approach can be ...
Nonparametric Bayes Pachinko Allocation
Recent advances in topic models have explored complicated structured distributions to represent topic correlation. For example, the pachinko allocation model (PAM) captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). While PAM provides more flexibility and greater expressive power than previous models like latent Dirichlet allocation ...